Skip to content

Conversation

@limbonaut
Copy link
Contributor

@limbonaut limbonaut commented May 23, 2025

This PR fixes corrupted line numbers for coroutines in the debugger and backtrace. I discovered this issue while testing #91006 for sentry-godot use case. The issue is described in detail here:

The changes are deliberately minimal -- just to fix this problem. Please let me know if you think we should take a different approach.

Fixes #106489

Explanation

Why it happens

  1. Function is entered, awaited, and therefore exited (we're not interested in details at this point).
  2. Later, the function is resumed with a function-state object (now we're interested in details).
  3. Inside call(): line variable is local, and enter_function() is passed a reference to this variable.
  4. enter_function() initializes CallLevel with a pointer to the line variable (which is scoped to call()).
  5. The line variable is continuously updated during GDScript function execution using OPCODE_LINE operation.
  6. The GDScript function execution ends, call() is exited, and line is popped off the stack, but the CallLevel instance still lives on due to specifics of the resumed function execution (exit_function() isn't called yet).
  7. The next function-state object is resumed via signal, and steps 2–6 repeat until no function-state objects remain in the resume chain.
  8. Resumed functions finally exit, but the CallLevel frames overstayed the lifetime of their line variables.

Conclusion: Because CallLevel outlives call()'s scope for resumed coroutines, CallLevel::line pointer can sometimes point to heaven-knows-what.

Hey @mihe, do you think this could be the reason why the initially proposed reversed linked-list call stack implementation crashed in coroutine unit tests? (#91006)

Explaining on example

Consider this code:

func _ready() -> void:
	await f1()
	for b in Engine.capture_script_backtraces(): print(b)

func f1():
	print("starting f1")
	await f2()
	print("finishing f1")

func f2():
	print("starting f2")
	await f3()
	print("finishing f2")

func f3():
	print("starting f3")
	await get_tree().process_frame
	print("finishing f3")

Output:

Note: Lines starting with "debug: " are print_line() statements I placed in the native code.

debug: call( @implicit_new ): entering
debug: enter_function( @implicit_new )    [ call_level.line: 0 ]
debug: exit_function( @implicit_new )    [ call_level.line: 0 ]
debug: call( @implicit_new ): exiting
debug: call( _ready ): entering
debug: enter_function( _ready )    [ call_level.line: 4 ]
debug: call( f1 ): entering
debug: enter_function( f1 )    [ call_level.line: 9 ]
starting f1
debug: call( f2 ): entering
debug: enter_function( f2 )    [ call_level.line: 15 ]
starting f2
debug: call( f3 ): entering
debug: enter_function( f3 )    [ call_level.line: 21 ]
starting f3
debug: exit_function( f3 )    [ call_level.line: 23 ]
debug: call( f3 ): exiting
debug: exit_function( f2 )    [ call_level.line: 17 ]
debug: call( f2 ): exiting
debug: exit_function( f1 )    [ call_level.line: 11 ]
debug: call( f1 ): exiting
debug: exit_function( _ready )    [ call_level.line: 5 ]
debug: call( _ready ): exiting
debug: call( f3 ): entering
debug: enter_function( f3 )    [ call_level.line: 23 ]
finishing f3
debug: call( f3 ): exiting
debug: call( f2 ): entering
debug: enter_function( f2 )    [ call_level.line: 17 ]
finishing f2
debug: call( f2 ): exiting
debug: call( f1 ): entering
debug: enter_function( f1 )    [ call_level.line: 11 ]
finishing f1
debug: call( f1 ): exiting
debug: call( _ready ): entering
debug: enter_function( _ready )    [ call_level.line: 5 ]
GDScript backtrace (most recent call first):
    [0] _ready (res://main.gd:6)
    [1] f1 (res://main.gd:32765)
    [2] f2 (res://main.gd:32765)
    [3] f3 (res://main.gd:32765)
debug: call( _ready ): exiting
debug: exit_function( _ready )    [ call_level.line: 6 ]
debug: exit_function( f1 )    [ call_level.line: 32765 ]
debug: exit_function( f2 )    [ call_level.line: 32765 ]
debug: exit_function( f3 )    [ call_level.line: 32765 ]

Notice that the backtrace reports crazy line numbers, and how call(f1) already exited, but the associated CallLevel frame lingers slightly longer in the end (exit_function). This only happens for resumed functions.

@limbonaut limbonaut requested a review from a team as a code owner May 23, 2025 18:09
@mihe
Copy link
Contributor

mihe commented May 23, 2025

Hey @mihe, do you think this could be the reason why the initially proposed reversed linked-list call stack implementation crashed in coroutine unit tests?

I don't remember the exact details of it, but it was a similar story to this, yes, except the entire CallLevel was the thing getting popped off the stack, thus blowing everything up.

What I don't quite understand is why line is a pointer at all. Why not just store it in CallLevel by-value and skirt around this whole problem?

Also, I was about to ask why ip isn't suffering from the same problem, but I see now that it's entirely unused, so should probably be removed.

@limbonaut
Copy link
Contributor Author

I think it's passed by reference, because it's continuously updated via OPCODE_LINE instructions. So initially it starts at the beginning of the function, and is incremented each time a statement is executed. And CallLevel doesn't exist in default release builds, right? Unless you enable it in the project settings.

@dalexeev dalexeev added this to the 4.5 milestone May 23, 2025
@limbonaut
Copy link
Contributor Author

I'm looking into making the reversed linked list call stack to work -- it was originally proposed by @reduz in #91006. Now that I have figured out this problem, I think it may work well with some adjustments.

limbonaut added a commit to limbonaut/godot that referenced this pull request May 25, 2025
* GDScript call stack as reverse linked list with issues fixed
(originally proposed in godotengineGH-91006).
* This makes it fast enough to leave it enabled all the time, so this
also drops "debug/settings/gdscript/always_track_call_stacks" setting.
* Fix coroutine issues with call stack by resuming async call chain
inside `GDScriptFunction::call()`.
* This fixes corrupted line numbers for coroutines in the debugger and
backtrace (godotengineGH-106758).

Co-authored-by: Juan Linietsky <[email protected]>
@limbonaut
Copy link
Contributor Author

@limbonaut limbonaut closed this May 25, 2025
@limbonaut limbonaut deleted the fix-coroutine-line-numbers branch May 25, 2025 09:17
@AThousandShips AThousandShips removed this from the 4.5 milestone May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect line numbers for coroutines in GDScript stack traces

4 participants